Synthetic-to-Real Pose Estimation with Geometric Reconstruction
Lin, Qiuxia, Gu, Kerui, Yang, Linlin, Yao, Angela
Pose estimation is remarkably successful under supervised learning, but obtaining annotations, especially for new deployments, is costly and time-consuming. This work tackles adapting models trained on synthetic data to real-world target domains with only unlabelled data. A common approach is model fine-tuning with pseudo-labels from the target domain; yet many pseudo-labelling strategies cannot provide sufficient high-quality pose labels. This work proposes a reconstruction-based strategy as a complement to pseudo-labelling for synthetic-to-real domain adaptation. We generate a driving image by geometrically transforming a base image according to the predicted keypoints and enforce a reconstruction loss to refine the predictions. This provides a novel way to correct confident yet inaccurate keypoint locations through image reconstruction during domain adaptation. Our approach outperforms the previous state of the art by 8% PCK on four large-scale hand and human real-world datasets. In particular, we excel on endpoints such as fingertips and the head, with 7.2% and 29.9% PCK improvements.
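The reconstruction-based correction described above can be illustrated with a deliberately tiny, self-contained sketch: the "image" is a Gaussian blob rendered at a keypoint, the driving image is the blob rendered at the predicted location, and a greedy search over the reconstruction loss pulls a confident-but-wrong prediction back to the true location. The rendering model and all names here are hypothetical simplifications, not the paper's pipeline.

```python
import numpy as np

H = W = 16

def render(kp):
    """Toy 'image': a Gaussian blob centred at the keypoint."""
    ys, xs = np.mgrid[0:H, 0:W]
    return np.exp(-((ys - kp[0]) ** 2 + (xs - kp[1]) ** 2) / 4.0)

def recon_loss(pred_kp, target_img):
    """Drive the base blob to the predicted location, compare with target."""
    return float(((render(pred_kp) - target_img) ** 2).mean())

def refine(pred_kp, target_img, steps=20):
    """Greedy local search: nudge the keypoint to lower the loss."""
    kp = np.asarray(pred_kp, float)
    for _ in range(steps):
        cands = [kp + d for d in ((1, 0), (-1, 0), (0, 1), (0, -1))]
        best = min(cands, key=lambda c: recon_loss(c, target_img))
        if recon_loss(best, target_img) >= recon_loss(kp, target_img):
            break
        kp = best
    return kp

true_kp = np.array([8.0, 5.0])
target = render(true_kp)               # the "real" target frame
refined = refine([11.0, 9.0], target)  # confident but wrong prediction
```

In this toy setting the refined keypoint lands back on the true location because the reconstruction loss decreases monotonically as the blobs move closer; the paper's contribution is making this correction signal work with real images and learned warps.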
Geometrically Constrained and Token-Based Probabilistic Spatial Transformers
Schmidt, Johann, Stober, Sebastian
Fine-grained visual classification (FGVC) remains highly sensitive to geometric variability, where objects appear under arbitrary orientations, scales, and perspective distortions. While equivariant architectures address this issue, they typically require substantial computational resources and restrict the hypothesis space. We revisit Spatial Transformer Networks (STNs) as a canonicalization tool for transformer-based vision pipelines, emphasizing their flexibility, backbone-agnostic nature, and lack of architectural constraints. We propose a probabilistic, component-wise extension that improves robustness. Specifically, we decompose affine transformations into rotation, scaling, and shearing, and regress each component under geometric constraints using a shared localization encoder. To capture uncertainty, we model each component with a Gaussian variational posterior and perform sampling-based canonicalization during inference. A novel component-wise alignment loss leverages augmentation parameters to guide spatial alignment. Experiments on challenging moth classification benchmarks demonstrate that our method consistently improves robustness compared to other STNs.
- South America > Ecuador (0.04)
- Europe > Germany > Saxony-Anhalt > Magdeburg (0.04)
- Europe > France (0.04)
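The component-wise probabilistic parameterisation from the abstract above can be sketched as follows: an affine map is composed from separate rotation, scale, and shear factors, and each component is drawn from a Gaussian posterior via the reparameterisation trick. The posterior parameters below are made-up stand-ins for what a shared localisation encoder would regress.

```python
import numpy as np

rng = np.random.default_rng(0)

def affine_from_components(theta, s, shear):
    """Compose a 2x2 affine map from rotation angle, isotropic scale,
    and horizontal shear (the constrained decomposition)."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    S = np.diag([s, s])
    Sh = np.array([[1.0, shear],
                   [0.0, 1.0]])
    return R @ S @ Sh

def sample_components(mu, log_sigma):
    """Reparameterised draw from the per-component Gaussian posterior."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(log_sigma) * eps

# hypothetical posterior parameters a localisation encoder might regress
mu = np.array([np.pi / 6, 1.2, 0.1])   # rotation, scale, shear means
log_sigma = np.full(3, -3.0)           # small, confident uncertainty

# sampling-based canonicalisation: average the transform over a few draws
draws = [affine_from_components(*sample_components(mu, log_sigma))
         for _ in range(8)]
A_mean = np.mean(draws, axis=0)
```

Regressing each factor separately is what allows geometric constraints (e.g. bounds on scale or shear) to be imposed per component rather than on the raw 2x3 affine matrix.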
MoNetV2: Enhanced Motion Network for Freehand 3D Ultrasound Reconstruction
Luo, Mingyuan, Yang, Xin, Yan, Zhongnuo, Cao, Yan, Zhang, Yuanji, Hu, Xindi, Wang, Jin, Ding, Haoxuan, Han, Wei, Sun, Litao, Ni, Dong
Abstract--Three-dimensional (3D) ultrasound (US) aims to provide sonographers with the spatial relationships of anatomical structures, playing a crucial role in clinical diagnosis. Recently, deep-learning-based freehand 3D US has made significant advancements. However, image-only reconstruction poses difficulties in reducing cumulative drift and further improving reconstruction accuracy, particularly in scenarios involving complex motion trajectories. In this context, we propose an enhanced motion network (MoNetV2) to enhance the accuracy and generalizability of reconstruction under diverse scanning velocities and tactics. First, we propose a sensor-based temporal and multi-branch structure that fuses image and motion information from a velocity perspective to improve image-only reconstruction accuracy. Second, we devise an online multi-level consistency constraint that exploits the inherent consistency of scans to handle various scanning velocities and tactics. This constraint exploits scan-level velocity consistency, path-level appearance consistency, and patch-level motion consistency to supervise inter-frame transformation estimation. Third, we distill an online multi-modal self-supervised strategy that leverages the correlation between network estimation and motion information to further reduce cumulative errors. Extensive experiments clearly demonstrate that MoNetV2 surpasses existing methods in both reconstruction quality and generalizability performance across three large datasets.
Ultrasound (US) imaging plays an important role in clinical monitoring and diagnosis because of its non-invasiveness, real-time capability, and mobility [1]. Its applications span various fields such as heart [2], fetus [3], breast [4], and liver [5]. Traditional 3D US imaging methods encompass mechanical, phased-array, and freehand techniques. Mechanical and phased-array imaging often suffer from specialized and expensive hardware with a limited field of view.
This work was supported by the National Natural Science Foundation of China (Nos. ). Jin Wang and Litao Sun are with the Cancer Center, Department of Ultrasound Medicine, Zhejiang Provincial People's Hospital, Affiliated People's Hospital of Hangzhou Medical College, Hangzhou, Zhejiang, China. Wei Han is with the Department of Health Management Center, Qilu Hospital, Cheeloo College of Medicine, Shandong University, Jinan, Shandong, China.
- Asia > China > Zhejiang Province > Hangzhou (0.44)
- Asia > China > Shandong Province (0.24)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- (3 more...)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Health & Medicine > Therapeutic Area (0.67)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)
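As a toy illustration of the scan-level velocity consistency mentioned in the MoNetV2 abstract above, one can penalise disagreement between the mean speed implied by predicted inter-frame translations and a sensor-reported scanning velocity. The function and numbers below are hypothetical illustrations, not the paper's actual constraint.

```python
import numpy as np

def velocity_consistency(pred_translations, sensor_velocity, dt):
    """Scan-level velocity consistency: the mean speed implied by predicted
    inter-frame translations should match the sensor-reported velocity."""
    speeds = np.linalg.norm(pred_translations, axis=1) / dt
    return float((speeds.mean() - sensor_velocity) ** 2)

# toy check: 0.5 mm per frame at 30 fps corresponds to 15 mm/s
pred = np.tile([0.5, 0.0, 0.0], (10, 1))
loss_match = velocity_consistency(pred, sensor_velocity=15.0, dt=1 / 30)
loss_off = velocity_consistency(pred, sensor_velocity=10.0, dt=1 / 30)
```

The loss vanishes when the network's estimated motion agrees with the sensor and grows quadratically with the disagreement, giving a drift-correcting supervision signal without any manual labels.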
IPFed: Identity protected federated learning for user authentication
Kaga, Yosuke, Suzuki, Yusei, Takahashi, Kenta
With the development of laws and regulations related to privacy preservation, it has become difficult to collect personal data for machine learning. In this context, federated learning, which is distributed learning without sharing personal data, has been proposed. In this paper, we focus on federated learning for user authentication. We show that it is difficult to achieve both privacy preservation and high accuracy with existing methods. To address these challenges, we propose IPFed, a privacy-preserving federated learning scheme that applies random projection to class embeddings. Furthermore, we prove that IPFed is capable of learning equivalent to the state-of-the-art method. Experiments on face image datasets show that IPFed can protect the privacy of personal data while maintaining the accuracy of the state-of-the-art method.
- Europe > United Kingdom > England > Greater London > London (0.04)
- Asia > Japan (0.04)
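The random-projection idea in the IPFed abstract can be sketched minimally: a client-held random matrix maps class embeddings before they leave the device, and because the projection is linear, scaled or summed updates behave consistently in the protected space. The dimensions and the `protect` helper are illustrative assumptions, not IPFed's actual protocol.

```python
import numpy as np

rng = np.random.default_rng(42)

d, k = 128, 64          # embedding dim, protected (projected) dim
P = rng.standard_normal((d, k)) / np.sqrt(k)  # client-held random projection

def protect(class_embedding):
    """Project a class embedding so the server never sees the raw vector."""
    return class_embedding @ P

w = rng.standard_normal(d)   # a class embedding for one enrolled user
w_protected = protect(w)
# linearity means aggregated updates computed on protected embeddings
# correspond to the same updates on the raw embeddings
```

Keeping `P` on the client is the privacy lever: the server only ever operates on projected vectors, while linearity preserves the algebra that federated averaging relies on.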
BreastRegNet: A Deep Learning Framework for Registration of Breast Faxitron and Histopathology Images
Golestani, Negar, Wang, Aihui, Bean, Gregory R, Rusu, Mirabela
A standard treatment protocol for breast cancer entails administering neoadjuvant therapy followed by surgical removal of the tumor and surrounding tissue. Pathologists typically rely on cabinet X-ray radiographs, known as Faxitron, to examine the excised breast tissue and diagnose the extent of residual disease. However, accurately determining the location, size, and focality of residual cancer can be challenging, and incorrect assessments can lead to clinical consequences. The utilization of automated methods can improve the histopathology process, allowing pathologists to choose regions for sampling more effectively and precisely. Despite the recognized necessity, there are currently no such methods available. Training such automated detection models requires accurate ground-truth labels on ex-vivo radiology images, which can be acquired by registering Faxitron and histopathology images and mapping the extent of cancer from histopathology to X-ray images. This study introduces a deep learning-based image registration approach trained on mono-modal synthetic image pairs. The models were trained using data from 50 women who received neoadjuvant chemotherapy and underwent surgery. The results demonstrate that our method is faster and yields significantly lower average landmark error ($2.1\pm1.96$ mm) than the state-of-the-art iterative ($4.43\pm4.1$ mm) and deep learning ($4.02\pm3.15$ mm) approaches. The improved performance of our approach in integrating radiology and pathology information facilitates generating large datasets, which allows training models for more accurate breast cancer detection.
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
- Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.55)
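Training on mono-modal synthetic pairs, as described in the BreastRegNet abstract above, amounts to warping an image by a known transform and using that transform as a free ground-truth label. The sketch below uses integer translations and brute-force registration purely for illustration; the paper's method regresses deformations with a network.

```python
import numpy as np

rng = np.random.default_rng(7)

def random_translation_pair(img, max_shift=3):
    """Mono-modal synthetic pair: the fixed image plus a randomly shifted
    moving image, with the ground-truth shift as a free label."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    moving = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    return img, moving, (int(dy), int(dx))

def register_translation(fixed_img, moving_img, max_shift=3):
    """Brute-force stand-in for a registration network: pick the shift
    that minimises mean squared error after undoing it."""
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            undone = np.roll(np.roll(moving_img, -dy, axis=0), -dx, axis=1)
            err = ((undone - fixed_img) ** 2).mean()
            if err < best_err:
                best, best_err = (dy, dx), err
    return best

fixed_img, moving_img, gt_shift = random_translation_pair(rng.random((32, 32)))
```

Because the warp is applied synthetically, every training pair comes with exact ground truth, sidestepping the need for manually annotated landmark correspondences.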
A Deep Registration Method for Accurate Quantification of Joint Space Narrowing Progression in Rheumatoid Arthritis
Wang, Haolin, Ou, Yafei, Fang, Wanxuan, Ambalathankandy, Prasoon, Goto, Naoto, Ota, Gen, Ikebe, Masayuki, Kamishima, Tamotsu
Rheumatoid arthritis (RA) is a chronic autoimmune inflammatory disease that results in progressive articular destruction and severe disability. Joint space narrowing (JSN) progression has been regarded as an important indicator of RA progression and has received sustained attention. In the diagnosis and monitoring of RA, radiology plays a crucial role in monitoring joint space. A new framework for monitoring joint space by quantifying JSN progression through image registration in radiographic images has been developed. This framework offers the advantage of high accuracy; however, challenges remain in reducing mismatches and improving reliability. In this work, a deep intra-subject rigid registration network is proposed to automatically quantify JSN progression in the early stage of RA. In our experiments, the mean squared error of the Euclidean distance between moving and fixed images is 0.0031, the standard deviation is 0.0661 mm, and the mismatching rate is 0.48%. The proposed method has sub-pixel accuracy, far exceeding manual measurements, and is immune to noise, rotation, and scaling of joints. Moreover, this work provides loss visualization, which can aid radiologists and rheumatologists in assessing quantification reliability, with important implications for possible future clinical applications. As a result, we are optimistic that this work will make a significant contribution to the automatic quantification of JSN progression in RA.
- Health & Medicine > Therapeutic Area > Rheumatology (1.00)
- Health & Medicine > Nuclear Medicine (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
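The registration residuals used to quantify accuracy above reduce to plain Euclidean distances after applying the estimated rigid transform. A minimal sketch with hypothetical 2D joint landmarks:

```python
import numpy as np

def registration_residuals(fixed_pts, moving_pts, R, t):
    """Apply the estimated rigid transform (rotation R, translation t) to
    the moving landmarks and report residual distances to the fixed ones."""
    warped = moving_pts @ R.T + t
    d = np.linalg.norm(warped - fixed_pts, axis=1)
    return d.mean(), d.std()

# hypothetical joint landmarks related by a 5-degree rotation and a shift
theta = np.deg2rad(5.0)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
t = np.array([1.5, -0.5])
fixed_pts = np.array([[0.0, 0.0], [4.0, 1.0], [2.0, 3.0], [5.0, 5.0]])
moving_pts = (fixed_pts - t) @ R   # exact inverse, so a perfect registration
mean_d, std_d = registration_residuals(fixed_pts, moving_pts, R, t)
```

With a perfectly estimated transform the residuals vanish; in practice the mean and standard deviation of these distances are exactly the kind of statistics the abstract reports.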
No Free Lunch in Self Supervised Representation Learning
Bendidi, Ihab, Bardes, Adrien, Cohen, Ethan, Lamiable, Alexis, Bollot, Guillaume, Genovesio, Auguste
Self-supervised representation learning in computer vision relies heavily on hand-crafted image transformations to learn meaningful and invariant features. However, few extensive explorations of the impact of transformation design have been conducted in the literature. In particular, the dependence of downstream performance on transformation design has been established, but not studied in depth. In this work, we explore this relationship, examine its impact on a domain other than natural images, and show that designing the transformations can be viewed as a form of supervision. First, we demonstrate not only that transformations affect downstream performance and the relevance of clustering, but also that each category in a supervised dataset can be impacted in a different way. Following this, we explore the impact of transformation design on microscopy images, a domain where the difference between classes is more subtle and fuzzy than in natural images. In this case, we observe a greater impact on downstream task performance. Finally, we demonstrate that transformation design can be leveraged as a form of supervision, as careful selection by a domain expert can lead to a drastic increase in performance on a given downstream task.
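The link between transformation design and invariance can be made concrete: an SSL objective built on an augmentation drives the representation's "gap" under that augmentation toward zero, so choosing the augmentation set chooses the invariances. A toy sketch with a hand-made feature (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def feature(img):
    """Toy representation: per-row means. By construction this is invariant
    to horizontal flips but sensitive to vertical flips."""
    return img.mean(axis=1)

def invariance_gap(img, transform):
    """How far the representation moves under a transformation: the quantity
    an SSL objective using that augmentation would push toward zero."""
    return float(np.linalg.norm(feature(img) - feature(transform(img))))

img = rng.random((8, 8))
gap_hflip = invariance_gap(img, lambda x: x[:, ::-1])  # in the invariance set
gap_vflip = invariance_gap(img, lambda x: x[::-1, :])  # outside it
```

Which transformations end up in the zero-gap set is entirely a design decision, which is the sense in which augmentation choice acts as a form of supervision.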
3D Labeling Tool
Rachwan, John, Zalaket, Charbel
Training and testing supervised object detection models requires a large collection of images with ground-truth labels. Labels define object classes in the image, as well as their locations, shapes, and possibly other information such as pose. The labeling process has proven extremely time-consuming, even with ample manpower. We introduce a novel labeling tool for 2D images as well as 3D triangular meshes: 3D Labeling Tool (3DLT). This is a standalone, feature-rich, cross-platform software package that does not require installation and runs on Windows, macOS and Linux-based distributions. Instead of labeling the same object on every image separately as current tools do, we use depth information to reconstruct a triangular mesh from said images and label the object only once on that mesh. We use registration to simplify 3D labeling, outlier detection to improve 2D bounding box calculation, and surface reconstruction to extend labeling to large point clouds. Our tool is tested against state-of-the-art methods and greatly surpasses them in speed while preserving accuracy and ease of use.
- North America > United States > California > San Francisco County > San Francisco (0.13)
- North America > United States > California > Los Angeles County > Los Angeles (0.13)
- Europe > Austria > Vienna (0.13)
- (12 more...)
- Workflow (1.00)
- Overview (1.00)
- Research Report > Promising Solution (0.48)
- Information Technology (1.00)
- Leisure & Entertainment (0.67)
- Media > Photography (0.45)
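The outlier-detection step for 2D bounding box calculation mentioned in the 3DLT abstract above can be sketched with a simple robust rule: discard projected points far from the median (in MAD units) before taking the min/max box. This is a guess at one reasonable implementation, not the tool's actual code.

```python
import numpy as np

def bbox_with_outlier_rejection(pts, k=3.0):
    """2D bounding box over projected mesh vertices, after discarding points
    more than k (scaled) MADs from the per-axis median."""
    med = np.median(pts, axis=0)
    mad = np.median(np.abs(pts - med), axis=0) + 1e-9
    keep = np.all(np.abs(pts - med) <= k * 1.4826 * mad, axis=1)
    inliers = pts[keep]
    return inliers.min(axis=0), inliers.max(axis=0)

rng = np.random.default_rng(1)
pts = np.concatenate([rng.uniform(10.0, 20.0, (200, 2)),  # object points
                      [[500.0, 500.0]]])                  # stray depth outlier
lo, hi = bbox_with_outlier_rejection(pts)
```

Without the rejection step a single stray reconstructed point would blow the box up to 500 units; with it, the box hugs the object.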
Quantised Transforming Auto-Encoders: Achieving Equivariance to Arbitrary Transformations in Deep Networks
Jiao, Jianbo, Henriques, João F.
In this work we investigate how to achieve equivariance to input transformations in deep networks, purely from data, without being given a model of those transformations. Convolutional Neural Networks (CNNs), for example, are equivariant to image translation, a transformation that can be easily modelled (by shifting the pixels vertically or horizontally). Other transformations, such as out-of-plane rotations, do not admit a simple analytic model. We propose an auto-encoder architecture whose embedding obeys an arbitrary set of equivariance relations simultaneously, such as translation, rotation, colour changes, and many others. This means that it can take an input image, and produce versions transformed by a given amount that were not observed before (e.g. a different point of view of the same object, or a colour variation). Despite extending to many (even non-geometric) transformations, our model reduces exactly to a CNN in the special case of translation-equivariance. Equivariances are important for the interpretability and robustness of deep networks, and we demonstrate results of successful re-rendering of transformed versions of input images on several synthetic and real datasets, as well as results on object pose estimation.
- North America > United States > Oklahoma > Beaver County (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
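The claim above that the model reduces exactly to a CNN in the translation-equivariant case rests on the fact that convolution commutes with translation. A numpy sketch verifying this for a naive "valid" cross-correlation (comparing only columns unaffected by the wrap-around of `np.roll`):

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Naive 'valid' cross-correlation, the basic CNN building block."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

rng = np.random.default_rng(3)
img = rng.random((10, 10))
kernel = rng.random((3, 3))

# translation equivariance: shifting the input shifts the feature map
a = conv2d_valid(np.roll(img, 2, axis=1), kernel)
b = np.roll(conv2d_valid(img, kernel), 2, axis=1)
# columns 0-1 are contaminated by np.roll's wrap-around; compare the rest
```

For translation the equivariance relation is analytic and exact; the auto-encoder in the paper learns analogous relations for transformations, like out-of-plane rotation, that have no such closed form.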